
A COMPUTATIONALLY EFFICIENT METHOD FOR DETERMINING SIGNIFICANCE IN INTERVAL MAPPING OF QUANTITATIVE TRAIT LOCI

Author: Nettleton Dan
Publication venue: New Prairie Press
Publication date: 25/04/1999

This paper provides a brief introduction to the mapping of quantitative trait loci (QTL). An example of mapping QTL for root thickness in rice illustrates popular statistical methods used in QTL mapping. Interval mapping is used in conjunction with permutation testing to detect significant associations between genetic positions and quantitative traits while controlling the overall type I error rate. A recent technique that can greatly reduce the computational expense of permutation testing in QTL mapping is reviewed. Theory is provided for an extension of recent results that may lead to more powerful methods of QTL mapping through permutation testing.

Kansas State University
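
The permutation-testing idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's method: it swaps in a simple single-marker statistic (squared correlation between trait and marker genotype) for the interval-mapping statistic, and it does not implement the computational shortcut the paper reviews. The function names, the 0/1 genotype coding, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def max_marker_stat(genotypes, trait):
    """Largest squared correlation between the trait and any marker column.

    Assumes every marker is polymorphic (no constant columns).
    """
    g = genotypes - genotypes.mean(axis=0)
    y = trait - trait.mean()
    num = (g.T @ y) ** 2
    den = (g ** 2).sum(axis=0) * (y ** 2).sum()
    return float(np.max(num / den))

def permutation_threshold(genotypes, trait, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide critical value from the permutation null of the maximum statistic.

    Shuffling the trait breaks any trait-marker association while keeping the
    marker data intact, so the (1 - alpha) quantile of the permuted maxima
    controls the overall type I error rate across all markers.
    """
    rng = np.random.default_rng(seed)
    null_max = np.empty(n_perm)
    for b in range(n_perm):
        null_max[b] = max_marker_stat(genotypes, rng.permutation(trait))
    return float(np.quantile(null_max, 1 - alpha))

# Hypothetical data: 100 individuals, 50 independent 0/1 markers,
# with one simulated QTL at marker index 10.
rng = np.random.default_rng(0)
geno = rng.integers(0, 2, size=(100, 50)).astype(float)
trait = 1.5 * geno[:, 10] + rng.normal(size=100)

observed = max_marker_stat(geno, trait)
threshold = permutation_threshold(geno, trait, n_perm=500)
print(f"observed max statistic: {observed:.3f}, 5% threshold: {threshold:.3f}")
```

Each permutation repeats the full genome scan, which is why the cost grows quickly with the number of permutations and markers and why cheaper alternatives are of interest.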

Accounting for spot matching uncertainty in the analysis of proteomics data from two-dimensional gel electrophoresis

Authors: Maitra Ranjan, Melnykov Volodymyr, Nettleton Dan
Publication venue: Iowa State University Digital Repository
Publication date: 01/05/2011

Two-dimensional gel electrophoresis is a biochemical technique that combines isoelectric focusing and SDS-polyacrylamide gel technology to achieve simultaneous separation of protein mixtures on the basis of isoelectric point and molecular weight. Upon staining, each protein on a gel can be characterized by an intensity measurement that reflects its abundance in the mixture. These measurements can then, in principle, be used to determine which proteins are differentially expressed under different experimental conditions. We propose an EM approach to identify differentially expressed proteins using an inferential strategy that accounts for uncertainty in matching spots to proteins across gels. The underlying mixture model has trivariate Gaussian components. Applying EM is, however, not straightforward; the main difficulty lies in the E-step calculations because of the dependent structure of proteins within each gel. Therefore, the usual model-based clustering approach is inapplicable, and an MCMC approach is employed. Through data-based simulation, we demonstrate that our proposed method effectively accounts for uncertainty in spot matching and more successfully distinguishes differentially and non-differentially expressed proteins than a naïve t-test that ignores uncertainty in spot matching.

Digital Repository @ Iowa State University (ISU)

Stability of Random Forests and Coverage of Random-Forest Prediction Intervals

Authors: Nettleton Dan, Wang Yan, Wu Huaiqing
Publication date: 28/10/2023

We establish stability of random forests under the mild condition that the squared response ($Y^2$) does not have a heavy tail. In particular, our analysis holds for the practical version of random forests that is implemented in popular packages like \texttt{randomForest} in \texttt{R}. Empirical results show that stability may persist even beyond our assumption and hold for heavy-tailed $Y^2$. Using the stability property, we prove a non-asymptotic lower bound for the coverage probability of prediction intervals constructed from the out-of-bag error of random forests. With another mild condition that is typically satisfied when $Y$ is continuous, we also establish a complementary upper bound, which can be similarly established for the jackknife prediction interval constructed from an arbitrary stable algorithm. We also discuss the asymptotic coverage probability under assumptions weaker than those considered in previous literature. Our work implies that random forests, with their stability property, are an effective machine learning method that can provide not only satisfactory point prediction but also justified interval prediction at almost no extra computational cost.

Comment: NeurIPS 202

arXiv.org e-Print Archive
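
To make the out-of-bag prediction interval mentioned in the abstract above concrete, here is a minimal sketch using scikit-learn's `RandomForestRegressor` rather than the R package the abstract names. It assumes one common construction, a symmetric interval whose half-width is the (1 - alpha) quantile of the absolute out-of-bag errors; the paper's exact construction may differ. The helper names and the simulated data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def simulate_regression(n, rng):
    """Hypothetical data: nonlinear signal in three features plus Gaussian noise."""
    X = rng.uniform(-2.0, 2.0, size=(n, 3))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)
    return X, y

def oob_prediction_interval(X_train, y_train, X_new, alpha=0.1, seed=0):
    """Symmetric prediction interval built from out-of-bag (OOB) errors.

    Each training point is predicted only by trees that did not see it, so the
    OOB errors come for free from a single fit, with no refitting required.
    """
    forest = RandomForestRegressor(
        n_estimators=200, oob_score=True, random_state=seed
    )
    forest.fit(X_train, y_train)
    oob_abs_err = np.abs(y_train - forest.oob_prediction_)
    half_width = np.quantile(oob_abs_err, 1 - alpha)
    center = forest.predict(X_new)
    return center - half_width, center + half_width

rng = np.random.default_rng(1)
X_train, y_train = simulate_regression(500, rng)
X_test, y_test = simulate_regression(500, rng)

lower, upper = oob_prediction_interval(X_train, y_train, X_test, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage at nominal 90%: {coverage:.3f}")
```

Because the half-width is a single number, every interval has the same length; this keeps the sketch short but ignores any dependence of the error spread on the features.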